Properties of the word set for estimating similarities between prokaryotic genomes in linguistic approach
Abstract
Recently, as the number of completely sequenced genomes has increased rapidly, comparison of whole genome sequences has become more important. The linguistic approach is one of the available methods for estimating the similarity between long sequences such as whole genomes[1]. In this method, a word set W is constructed, in which a word is defined as a fixed-length sequence over the four nucleotide letters, and the frequency of appearance of each word in W is calculated throughout a genome. The similarity between genomes is estimated by comparing the frequency distributions calculated for the genomes, using a measure such as Kendall's rank correlation. To apply the linguistic approach, three properties of W must be determined in advance: the word length L, the size n, and the contents. In a previous study[2], by analyzing word diversity in prokaryotic genomes, we found that L = 8 to 12 is appropriate for prokaryotic species. In this study, we investigate the other two properties, the size and the contents of W, that are adequate for analyzing the similarity between prokaryotic genomes.
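The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy sequences, and the choice of the tie-aware tau-b variant of Kendall's rank correlation are all assumptions, and the example uses L = 2 with all 16 dinucleotides as W purely to keep it small, whereas the abstract reports L = 8 to 12 as appropriate for prokaryotes.

```python
from collections import Counter
from itertools import combinations, product
from math import sqrt


def word_frequencies(genome, words, length):
    """Count overlapping occurrences of each word of W in a genome string."""
    counts = Counter(
        genome[i:i + length] for i in range(len(genome) - length + 1)
    )
    return [counts[w] for w in words]


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length lists (O(n^2), tie-aware).

    Assumption: tau-b is used here; the source only says "Kendall's rank
    correlation" without naming a variant.
    """
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both denominator terms
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def genome_similarity(genome_a, genome_b, words, length):
    """Compare two genomes by the rank correlation of their word frequencies."""
    freq_a = word_frequencies(genome_a, words, length)
    freq_b = word_frequencies(genome_b, words, length)
    return kendall_tau_b(freq_a, freq_b)


if __name__ == "__main__":
    L = 2  # toy word length, chosen only to keep the example small
    W = ["".join(p) for p in product("ACGT", repeat=L)]  # hypothetical word set
    genome_a = "ACGTACGGTACCGATTACGA"  # made-up sequences for illustration
    genome_b = "ACGTTCGGTACCGATAACGA"
    print(genome_similarity(genome_a, genome_b, W, L))
```

With realistic word lengths the full word set has 4^L members (about 16.8 million for L = 12), which is why the size n and the contents of W have to be chosen rather than taken exhaustively, as this toy example does.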
Similar resources
The Effect of Word Meaning on Speech DysFluency in Adults with Developmental Stuttering
Objectives: Stuttering is one of the most prevalent speech and language disorders. The symptomatology of stuttering has been studied from different aspects, such as biological, developmental, environmental, emotional, learning, and linguistic. Previous research on English-speaking people has suggested that some linguistic features such as word meanings may play a role in the frequency of speech non...
Arithmetic Aggregation Operators for Interval-valued Intuitionistic Linguistic Variables and Application to Multi-attribute Group Decision Making
The intuitionistic linguistic set (ILS) is an extension of the linguistic variable. To overcome the drawback of using a single real number to represent the membership degree and non-membership degree for ILS, this paper introduces the concept of the interval-valued intuitionistic linguistic set (IVILS), which represents the membership degree and non-membership degree of ILS with intervals. The oper...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
A Comparison of Relationship between Text and Picture in the Selected Iranian and Contemporary American-European Illustrated-Fiction Books Based on the Theory of Maria Nikolajeva and Carole Scott
Illustrated-fiction books are special forms of art that are the combination of text and picture. The relationship between text and picture in this genre is diverse and variegated, and has different effects on the audience; however, little research has been done about it. The goal of this research is to compare text/picture relationship in the selected Iranian and contemporary American-European ...
Iranian Advanced EFL Learners’ Awareness and the Use of Marked Word Order: Discourse-pragmatically Motivated Variations
The present investigation was designed to study the production and comprehension of specific means of information highlighting by advanced Iranian learners of English as a Foreign Language. The study focused on discourse-pragmatically motivated variations of the basic word order, such as inversion, pre-posing, and it- and Wh-clefts. After the Nelson test was administered, a homogeneous group was selected. ...